Description


As a data engineer working for a firm, you have been given a dataset of emails
and need to perform the following tasks:


Group by the sender. 
List all the unique senders. 
Count all the unique senders. 
Group by sender and count to find out which email address sent the most emails. 
Rank the senders in order of emails sent.
Rank the senders in order of emails sent + emails received.
o Hint: You will need to use a project stage to do this. 
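
The grouping and ranking tasks above could be sketched as aggregation pipelines along the following lines. This is only a minimal sketch: the field names `sender` and `recipients` (taken to be an array of addresses) are assumptions, not taken from the dataset, so check a sample document first and adjust them. The pipelines are plain arrays, and the queries only run when the script is loaded in a mongo shell where `db` points at the enron database.

```javascript
// Assumed document shape (hypothetical; confirm with db.emails.findOne()):
//   { sender: "a@example.com", recipients: ["b@example.com"], ... }

// Group by sender, count, and rank by emails sent (busiest first).
const sentBySender = [
  { $group: { _id: "$sender", sent: { $sum: 1 } } },
  { $sort: { sent: -1 } }
];

// Count the unique senders.
const uniqueSenderCount = [
  { $group: { _id: "$sender" } },
  { $count: "uniqueSenders" }
];

// Rank by emails sent + emails received.
const sentPlusReceived = [
  // $project merges the sender and all recipients into one array per email.
  { $project: {
      participants: {
        $concatArrays: [["$sender"], { $ifNull: ["$recipients", []] }]
      }
  } },
  // One document per participant, then count appearances per address.
  { $unwind: "$participants" },
  { $group: { _id: "$participants", total: { $sum: 1 } } },
  { $sort: { total: -1 } }
];

// Runs only inside a mongo shell connected to the enron database.
if (typeof db !== "undefined") {
  printjson(db.emails.distinct("sender"));                    // list unique senders
  printjson(db.emails.aggregate(uniqueSenderCount).toArray());
  printjson(db.emails.aggregate(sentBySender).toArray()[0]);  // top sender
  printjson(db.emails.aggregate(sentPlusReceived).toArray());
}
```

The first document returned by `sentBySender` is the address that sent the most emails, because the sort is descending on the count.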


Create a timeline that shows the number of emails by day. 
Create a filter that shows the number of emails by day, sent by people who have sent 10
or more emails.
o First, prep the data. The date is stored as a string, so you will need to
convert it to an actual date.
o We can split the data using a script like this one. See if you can work out what it is
doing:


db.startups.find({}).forEach(function (startup) {
    // Only string values have a split method.
    if (startup.tag_list && startup.tag_list.split) {
        startup.tag_list = startup.tag_list.split(',').map(function (a) {
            return a.trim();
        });
    } else {
        startup.tag_list = [];
    }
    print(startup.tag_list);
    // Write the modified document back to the collection.
    db.startups.save(startup);
});
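
For the two timeline tasks above, one possible shape for the pipelines is sketched below. It assumes the date has already been converted to a real Date and lives in a field called `date`, and that the sender lives in `sender`; both names are assumptions. `emailsPerDayFor` is a hypothetical helper introduced here for illustration.

```javascript
// Number of emails per calendar day, oldest day first.
// Requires `date` to be a Date, not a string.
const emailsPerDay = [
  { $group: {
      _id: { $dateToString: { format: "%Y-%m-%d", date: "$date" } },
      emails: { $sum: 1 }
  } },
  { $sort: { _id: 1 } }
];

// Senders who have sent 10 or more emails.
const frequentSenders = [
  { $group: { _id: "$sender", sent: { $sum: 1 } } },
  { $match: { sent: { $gte: 10 } } }
];

// Hypothetical helper: the per-day timeline restricted to the given senders.
function emailsPerDayFor(senders) {
  return [{ $match: { sender: { $in: senders } } }].concat(emailsPerDay);
}

// Runs only inside a mongo shell connected to the enron database.
if (typeof db !== "undefined") {
  printjson(db.emails.aggregate(emailsPerDay).toArray());
  const senders = db.emails.aggregate(frequentSenders).toArray()
    .map(function (doc) { return doc._id; });
  printjson(db.emails.aggregate(emailsPerDayFor(senders)).toArray());
}
```

Running the filter as two queries keeps each pipeline simple: the first finds the frequent senders, the second reuses the per-day grouping on their emails only.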


Dataset 


Download the dataset here: https://bit.ly/314S1uP
Import it into MongoDB using mongoimport, with a command something like this:


mongoimport --db enron --collection emails --file enron.json 
Convert the appropriate field from a string into a Date, like this:
new Date("2000-08-23 02:16:00-07:00")


Iterate over the dataset, converting all the date strings into dates and saving them back into
the database. You will need to use the save command for this.
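
The conversion step above might look something like the sketch below. The field name `date` is an assumption, and `toDate` is a hypothetical helper written for this illustration. It swaps the space for a `T` so the string is strict ISO 8601 before parsing, and returns null for anything it cannot parse so that bad values are left untouched.

```javascript
// Hypothetical helper: turn a string such as "2000-08-23 02:16:00-07:00"
// into a Date, or return null when the value cannot be parsed.
function toDate(value) {
  if (value instanceof Date) {
    return value; // already converted
  }
  const parsed = new Date(String(value).replace(" ", "T"));
  return isNaN(parsed.getTime()) ? null : parsed;
}

// Runs only inside a mongo shell connected to the enron database.
if (typeof db !== "undefined") {
  // Only visit documents whose date is still a string.
  db.emails.find({ date: { $type: "string" } }).forEach(function (email) {
    const converted = toDate(email.date);
    if (converted !== null) {
      email.date = converted;
      // save() is the command the exercise asks for; in a shell that lacks it,
      // db.emails.replaceOne({ _id: email._id }, email) is a common substitute.
      db.emails.save(email);
    }
  });
}
```

Filtering on `$type: "string"` means the script can be run again safely, since documents that were already converted are skipped.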


